black stone
TTT-Bench: A Benchmark for Evaluating Reasoning Ability with Simple and Novel Tic-Tac-Toe-style Games
Mishra, Prakamya, Liu, Jiang, Wu, Jialian, Yu, Xiaodong, Liu, Zicheng, Barsoum, Emad
Large reasoning models (LRMs) have demonstrated impressive reasoning capabilities across a broad range of tasks including Olympiad-level mathematical problems, indicating evidence of their complex reasoning abilities. While many reasoning benchmarks focus on the STEM domain, the ability of LRMs to reason correctly in broader task domains remains underexplored. In this work, we introduce TTT-Bench, a new benchmark that is designed to evaluate basic strategic, spatial, and logical reasoning abilities in LRMs through a suite of four two-player Tic-Tac-Toe-style games that humans can effortlessly solve from a young age. We propose a simple yet scalable programmatic approach for generating verifiable two-player game problems for TTT-Bench. Although these games are trivial for humans, they require reasoning about the intentions of the opponent, as well as the game board's spatial configurations, to ensure a win. We evaluate a diverse set of state-of-the-art LRMs, and discover that the models that excel at hard math problems frequently fail at these simple reasoning games. Further testing reveals that our evaluated reasoning models score on average 41% and 5% lower on TTT-Bench compared to MATH 500 and AIME 2024 respectively, with larger models achieving higher performance using shorter reasoning traces, where most of the models struggle on long-term strategic reasoning situations on simple and new TTT-Bench tasks.
Explaining How a Neural Network Play the Go Game and Let People Learn
Zhou, Huilin, Tang, Huijie, Li, Mingjie, Zhang, Hao, Liu, Zhenyu, Zhang, Quanshi
The AI model has surpassed human players in the game of Go [Fang et al., 2018, Granter et al., 2017, Intelligence, 2016], and it is widely believed that the AI model has encoded new knowledge about the Go game beyond human players. In this way, explaining the knowledge encoded by the AI model and using it to teach human players represent a promising-yet-challenging issue in explainable AI. To this end, mathematical supports are required to ensure that human players can learn accurate and verifiable knowledge, rather than specious intuitive analysis. Thus, in this paper, we extract interaction primitives between stones encoded by the value network for the Go game, so as to enable people to learn from the value network. Experiments show the effectiveness of our method.
World's top weiqi player Ke Jie loses third match against AlphaGo
The world's No.1 weiqi (Go) player Ke Jie lost the contest against his artificial intelligence (AI) rival, AlphaGo, in the third and also final match of the summit on Saturday. This match began at 10:30 BJT in Wuzhen, east China's Zhejiang Province, with AlphaGo playing the black and Ke white. Ke showed his brilliant weiqi skills as he said he will "fight till the end," though he lost his previous two matches against AlphaGo on Tuesday and Thursday. AlphaGo made the first "impolite" move as it did on Thursday -- to put the black stone on the bottom-right corner of the weiqi board. It is a Chinese tradition that the first stone is usually placed around the top-right corner and this is what weiqi coaches always teach beginners.
How Google's AlphaGo Beat a Go World Champion
Tonight, Lee Sedol is supported by one 33-year-old human brain and approximately 12 ounces of coffee. At its core, the game of Go, which originated in China more than 2,500 years ago, is an abstract war simulation. Players start with a completely blank board and place black and white stones, one at a time, to surround territory. Once placed, stones do not move, and they're removed only if they're "killed"--that is, surrounded completely by the opponent's stones. And so the game goes--black stone, white stone, black stone, white stone--until the board is covered in an intricate tapestry of black and white.